NSF PAR Search | NSF Public Access Repository

Generalized data thinning using sufficient statistics

https://doi.org/10.1080/01621459.2024.2353948

Dharamshi, Ameer; Neufeld, Anna; Motwani, Keshav; Gao, Lucy L; Witten, Daniela; Bien, Jacob (May 2024, Journal of the American Statistical Association)

Full Text Available
Binned multinomial logistic regression for integrative cell-type annotation

https://doi.org/10.1214/23-AOAS1769

Motwani, Keshav; Bacher, Rhonda; Molstad, Aaron J (December 2023, The Annals of Applied Statistics)

Categorizing individual cells into one of many known cell-type categories, also known as cell-type annotation, is a critical step in the analysis of single-cell genomics data. The current process of annotation is time intensive and subjective, which has led to different studies describing cell types with labels of varying degrees of resolution. While supervised learning approaches have provided automated solutions to annotation, there remains a significant challenge in fitting a unified model for multiple datasets with inconsistent labels. In this article we propose a new multinomial logistic regression estimator which can be used to model cell-type probabilities by integrating multiple datasets with labels of varying resolution. To compute our estimator, we solve a nonconvex optimization problem using a blockwise proximal gradient descent algorithm. We show through simulation studies that our approach estimates cell-type probabilities more accurately than competitors in a wide variety of scenarios. We apply our method to 10 single-cell RNA-seq datasets and demonstrate its utility in predicting fine resolution cell-type labels on unlabeled data as well as refining cell-type labels on data with existing coarse resolution annotations. Finally, we demonstrate that our method can lead to novel scientific insights in the context of a differential expression analysis comparing peripheral blood gene expression before and after treatment with interferon-β. An R package implementing the method is available in the Supplementary Material and at https://github.com/keshav-motwani/IBMR, and the collection of datasets we analyze is available at https://github.com/keshav-motwani/AnnotatedPBMC.
Full Text Available
Tractometry of the Human Connectome Project: resources and insights

https://doi.org/10.3389/fnins.2024.1389680

Kruper, John; Hagen, McKenzie P; Rheault, François; Crane, Isaac; Gilmore, Asa; Narayan, Manjari; Motwani, Keshav; Lila, Eardi; Rorden, Chris; Yeatman, Jason D; et al. (June 2024, Frontiers in Neuroscience)

The Human Connectome Project (HCP) has become a keystone dataset in human neuroscience, with a plethora of important applications in advancing brain imaging methods and an understanding of the human brain. We focused on tractometry of HCP diffusion-weighted MRI (dMRI) data. We used an open-source software library (pyAFQ; https://yeatmanlab.github.io/pyAFQ) to perform probabilistic tractography and delineate the major white matter pathways in the HCP subjects that have a complete dMRI acquisition (n = 1,041). We used diffusion kurtosis imaging (DKI) to model white matter microstructure in each voxel of the white matter, and extracted tract profiles of DKI-derived tissue properties along the length of the tracts. We explored the empirical properties of the data: first, we assessed the heritability of DKI tissue properties using the known genetic linkage of the large number of twin pairs sampled in HCP. Second, we tested the ability of tractometry to serve as the basis for predictive models of individual characteristics (e.g., age, crystallized/fluid intelligence, reading ability), compared to local connectome features. To facilitate the exploration of the dataset we created a new web-based visualization tool and used it to visualize the data in the HCP tractometry dataset. Finally, we used the HCP dataset as a test-bed for a new technological innovation: the TRX file-format for representation of dMRI-based streamlines. We released the processing outputs and tract profiles as a publicly available data resource through the AWS Open Data program's Open Neurodata repository. We found heritability as high as 0.9 for DKI-based metrics in some brain pathways. We also found that tractometry extracts as much useful information about individual differences as the local connectome method. We released a new web-based visualization tool for tractometry, “Tractoscope” (https://nrdg.github.io/tractoscope). We found that the TRX files require considerably less disk space, a crucial attribute for large datasets like HCP. In addition, TRX incorporates a specification for grouping streamlines, further simplifying tractometry analysis.
Full Text Available
Multiresolution Categorical Regression for Interpretable Cell-Type Annotation

https://doi.org/10.1111/biom.13926

Molstad, Aaron J.; Motwani, Keshav (October 2023, Biometrics)

In many categorical response regression applications, the response categories admit a multiresolution structure. That is, subsets of the response categories may naturally be combined into coarser response categories. In such applications, practitioners are often interested in estimating the resolution at which a predictor affects the response category probabilities. In this paper, we propose a method for fitting the multinomial logistic regression model in high dimensions that addresses this problem in a unified and data-driven way. Our method allows practitioners to identify which predictors distinguish between coarse categories but not fine categories, which predictors distinguish between fine categories, and which predictors are irrelevant. For model fitting, we propose a scalable algorithm that can be applied when the coarse categories are defined by either overlapping or nonoverlapping sets of fine categories. Statistical properties of our method reveal that it can take advantage of this multiresolution structure in a way existing estimators cannot. We use our method to model cell-type probabilities as a function of a cell's gene expression profile (i.e., cell-type annotation). Our fitted model provides novel biological insights which may be useful for future automated and manual cell-type annotation methodology.
